Interlaced Greedy Algorithm for Maximization of Submodular Functions in Nearly Linear Time
A deterministic approximation algorithm is presented for the maximization of non-monotone submodular functions over a ground set of size $n$ subject to a cardinality constraint $k$. The algorithm interlaces two thresholded greedy procedures to obtain a tight ratio of $1/4 - \epsilon$ in $O \left( \frac{n}{\epsilon} \log \left( \frac{k}{\epsilon} \right) \right)$ queries of the objective function, which improves upon both the ratio and the quadratic time complexity of the previously fastest deterministic algorithm for this problem. The algorithm is validated on two applications of non-monotone submodular maximization, on which it outperforms the fastest deterministic and randomized algorithms in the prior literature.
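The descending-threshold idea behind the query bound can be illustrated in a few lines. The sketch below is a much-simplified, single-solution thresholded greedy on a toy coverage objective (the paper's interlacing of two such procedures and its non-monotone analysis are not reproduced); all names and data are illustrative.

```python
# Descending-threshold greedy for cardinality-constrained submodular
# maximization, with set coverage as the submodular objective.

def coverage(sets, chosen):
    """f(S) = number of ground elements covered by the chosen sets."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)

def threshold_greedy(sets, k, eps=0.1):
    n = len(sets)
    solution = []
    # Start the threshold at the best singleton value, decay by (1 - eps).
    tau = max(coverage(sets, [i]) for i in range(n))
    lower = eps * tau / k
    while tau >= lower and len(solution) < k:
        for e in range(n):
            if e in solution or len(solution) >= k:
                continue
            gain = coverage(sets, solution + [e]) - coverage(sets, solution)
            if gain >= tau:          # accept any element clearing the bar
                solution.append(e)
        tau *= (1 - eps)
    return solution

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
S = threshold_greedy(sets, k=2)      # picks sets 0 and 2, covering all six
```

Each threshold level costs one linear scan of the $n$ elements, and there are $O(\log(k/\epsilon)/\epsilon)$ levels, which is the shape of the stated $O\left(\frac{n}{\epsilon}\log\frac{k}{\epsilon}\right)$ query count.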
Principal Component Projection and Regression in Nearly Linear Time through Asymmetric SVRG
Given an n-by-d data matrix A, principal component projection (PCP) and principal component regression (PCR), i.e., projection and regression restricted to the top eigenspace of A, are fundamental problems in machine learning, optimization, and numerical analysis. In this paper we provide the first algorithms that solve these problems in nearly linear time for a fixed eigenvalue distribution and large n. This improves upon previous methods, which had superlinear running times when either the number of top eigenvalues or the gap between the eigenspaces was large. We achieve our results by applying rational polynomial approximations to reduce the problem to solving asymmetric linear systems, which we solve with a variant of SVRG. We corroborate these findings with preliminary empirical experiments.
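To make the problem statement concrete, here is the exact, eigendecomposition-based definition of PCP that fast methods approximate: project a vector onto the span of the eigenvectors of A^T A whose eigenvalues exceed a threshold. This cubic-time baseline is only a reference point, not the paper's algorithm; the data and threshold are illustrative.

```python
# Exact principal component projection via a full eigendecomposition --
# the expensive baseline that nearly-linear-time methods approximate.
import numpy as np

def pcp_exact(A, b, lam):
    """Project b onto the span of eigenvectors of A.T @ A with eigenvalue > lam."""
    evals, evecs = np.linalg.eigh(A.T @ A)   # eigenvalues in ascending order
    top = evecs[:, evals > lam]              # basis of the top eigenspace
    return top @ (top.T @ b)                 # orthogonal projection onto it

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(5)
proj = pcp_exact(A, b, lam=100.0)
```

PCR then solves the least-squares problem after restricting to the same eigenspace. Note the projection is idempotent: applying `pcp_exact` to its own output returns it unchanged.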
Frank-Wolfe-based Algorithms for Approximating Tyler's M-estimator
Tyler's M-estimator is a well-known procedure for robust and heavy-tailed covariance estimation. Tyler himself suggested an iterative fixed-point algorithm for computing his estimator; however, it requires super-linear (in the size of the data) runtime per iteration, which may be prohibitive at large scale. In this work we propose, to the best of our knowledge, the first Frank-Wolfe-based algorithms for computing Tyler's estimator. One variant uses standard Frank-Wolfe steps, a second also considers \textit{away-steps} (AFW), and a third is a \textit{geodesic} version of AFW (GAFW). AFW provably requires, up to a log factor, only linear time per iteration, while GAFW runs in linear time (up to a log factor) in the large-$n$ (number of data points) regime. All three variants provably converge to the optimal solution at a sublinear rate, under standard assumptions, despite the fact that the underlying optimization problem is neither convex nor smooth. Under an additional fairly mild assumption, which holds with probability 1 when the (normalized) data points are i.i.d.
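The classical fixed-point iteration mentioned above can be sketched directly: repeatedly set Sigma to (d/n) * sum_i x_i x_i^T / (x_i^T Sigma^{-1} x_i) and renormalize the trace. Each step inverts Sigma and reweights every data point, which is the super-linear per-iteration cost the Frank-Wolfe variants aim to avoid. The data below is illustrative.

```python
# Tyler's fixed-point iteration for the M-estimator (not the paper's
# Frank-Wolfe algorithms), normalized to trace d at every step.
import numpy as np

def tyler_fixed_point(X, iters=50):
    n, d = X.shape
    sigma = np.eye(d)
    for _ in range(iters):
        inv = np.linalg.inv(sigma)
        # Mahalanobis-type weights x_i^T Sigma^{-1} x_i for all points at once.
        w = np.einsum('ij,jk,ik->i', X, inv, X)
        sigma = (d / n) * (X / w[:, None]).T @ X   # sum_i x_i x_i^T / w_i
        sigma *= d / np.trace(sigma)               # fix the scale: trace = d
    return sigma

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4))
S = tyler_fixed_point(X)       # close to the identity for Gaussian data
```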
Deterministic Approximation for Submodular Maximization over a Matroid in Nearly Linear Time
We study the problem of maximizing a non-monotone, non-negative submodular function subject to a matroid constraint. The prior best-known deterministic approximation ratio for this problem is $\frac{1}{4}-\epsilon$ under $\mathcal{O}(({n^4}/{\epsilon})\log n)$ time complexity. We show that this deterministic ratio can be improved to $\frac{1}{4}$ under $\mathcal{O}(nr)$ time complexity, and then present a more practical algorithm dubbed TwinGreedyFast which achieves a $\frac{1}{4}-\epsilon$ deterministic ratio in nearly-linear running time of $\mathcal{O}(\frac{n}{\epsilon}\log\frac{r}{\epsilon})$. Our approach is based on a novel algorithmic framework of simultaneously constructing two candidate solution sets through greedy search, which enables us to obtain improved performance bounds by fully exploiting the properties of independence systems. As a byproduct of this framework, we also show that TwinGreedyFast achieves a $\frac{1}{2p+2}-\epsilon$ deterministic ratio under a $p$-set system constraint with the same time complexity. To showcase the practicality of our approach, we empirically evaluated the performance of TwinGreedyFast on two network applications, and observed that it outperforms efficient implementations of the state-of-the-art deterministic and randomized algorithms for our problem.
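The two-candidate-set idea can be illustrated under a cardinality constraint (a special case of a matroid): each element is offered to whichever of two disjoint sets gives it the larger marginal gain, and the better set is returned. The thresholding that gives TwinGreedyFast its nearly-linear time is omitted, and the coverage objective below is a toy stand-in.

```python
# Bare-bones two-candidate greedy: maintain disjoint sets A and B, route
# each element to the set where its marginal gain is larger, keep the best.

def f(sets, chosen):
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)

def twin_greedy(sets, r):
    A, B = [], []
    for e in range(len(sets)):
        gain_a = f(sets, A + [e]) - f(sets, A) if len(A) < r else float('-inf')
        gain_b = f(sets, B + [e]) - f(sets, B) if len(B) < r else float('-inf')
        if max(gain_a, gain_b) <= 0:
            continue                      # only strictly improving additions
        if gain_a >= gain_b:
            A.append(e)
        else:
            B.append(e)
    return A if f(sets, A) >= f(sets, B) else B

sets = [{1, 2}, {2, 3}, {1, 2, 3}, {4}]
S = twin_greedy(sets, r=2)
```

Keeping two disjoint solutions is what makes the non-monotone analysis work: an element that hurts one solution can still be charged against the other.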
How Hard is it to Explain Preferences Using Few Boolean Attributes?
Anzinger, Clemens, Chen, Jiehua, Hatschka, Christian, Sorge, Manuel, Temper, Alexander
We study the computational complexity of explaining preference data through Boolean attribute models (BAMs), motivated by extensive research involving attribute models and their promise in understanding preference structure and enabling more efficient decision-making processes. In a BAM, each alternative has a subset of Boolean attributes, each voter cares about a subset of attributes, and voters prefer alternatives with more of their desired attributes. In the BAM problem, we are given a preference profile and a number k, and want to know whether there is a Boolean k-attribute model explaining the profile. We establish a complexity dichotomy in the number of attributes k: BAM is linear-time solvable for $k \le 2$ but NP-complete for $k \ge 3$. The problem remains hard even when preference orders have length two. On the positive side, BAM becomes fixed-parameter tractable when parameterized by the number of alternatives m. For the special case of two voters, we provide a linear-time algorithm. We also analyze variants where partial information is given: when voter preferences over attributes are known (BAM WITH CARES) or when alternative attributes are specified (BAM WITH HAS). We show that for most parameters BAM WITH CARES is harder, whereas BAM WITH HAS is more tractable, though it remains NP-hard even for a single voter.
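The BAM semantics described above are easy to pin down with a small checker: given attribute sets for alternatives and care sets for voters, a voter's strict preference order is explained when each alternative has strictly more of that voter's desired attributes than the next one in the order. The instance data and the strictness convention below are illustrative (the hard problem is finding such a model, not verifying one).

```python
# Verifier for a candidate Boolean attribute model against a preference profile.

def explains(attrs, cares, profile):
    """attrs: alternative -> attribute set; cares: voter -> attribute set;
    profile: voter -> strict preference order (most preferred first)."""
    for voter, order in profile.items():
        score = [len(attrs[a] & cares[voter]) for a in order]
        # Counts of desired attributes must strictly decrease down the order.
        if any(x <= y for x, y in zip(score, score[1:])):
            return False
    return True

attrs = {'a': {1, 2}, 'b': {1}, 'c': set()}
cares = {'v1': {1, 2}, 'v2': {1}}
profile = {'v1': ['a', 'b', 'c'], 'v2': ['a', 'c']}
ok = explains(attrs, cares, profile)
```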
Near-optimal density estimation in near-linear time using variable-width histograms
Given a univariate distribution with support on the unit interval, this paper develops a method for estimating the density of the distribution via a histogram. The histogram has the following features: the bins are data-dependent, the bins are determined efficiently (in near-linear time), the number of bins k is specified by the user, and the estimator satisfies a probably-approximately-correct-style performance guarantee. The paper is entirely theoretical. Pros: the writing is clear.
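The simplest data-dependent-bin histogram makes the "variable-width" idea concrete: with k equal-mass bins, the bin edges are empirical quantiles and the density on each bin is count/(n * width). This sketch only illustrates variable-width bins; it is not the paper's estimator and carries none of its guarantee.

```python
# k equal-mass bins: edges at empirical quantiles, density = count/(n*width).
import numpy as np

def equal_mass_histogram(samples, k):
    x = np.asarray(samples)
    n = len(x)
    edges = np.quantile(x, np.linspace(0.0, 1.0, k + 1))
    counts, _ = np.histogram(x, bins=edges)
    density = counts / (n * np.diff(edges))   # piecewise-constant density
    return edges, density

rng = np.random.default_rng(2)
samples = rng.beta(2, 5, size=10_000)         # a skewed density on [0, 1]
edges, density = equal_mass_histogram(samples, k=10)
```

Narrow bins land where the data is dense, so the estimator adapts to the shape of the distribution without the user choosing bin locations.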
Top Rank Optimization in Linear Time
Bipartite ranking aims to learn a real-valued ranking function that orders positive instances before negative instances. Recent efforts in bipartite ranking have focused on optimizing ranking accuracy at the top of the ranked list. Most existing approaches either optimize task-specific metrics or extend the rank loss to place more emphasis on the errors associated with the top-ranked instances, leading to a high computational cost that is super-linear in the number of training instances. We propose a highly efficient approach, named TopPush, for optimizing accuracy at the top whose computational complexity is linear in the number of training instances. We present a novel analysis that bounds the generalization error for the top-ranked instances under the proposed approach. An empirical study shows that the proposed approach is highly competitive with the state-of-the-art approaches and is 10-100 times faster.
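The structural trick that enables linear cost can be sketched for a loss of this flavor: instead of summing a pairwise loss over all positive-negative pairs (quadratic in the data), each positive score is compared only to the maximum negative score, which is computed once. The scores below are illustrative; this is a stand-in for the flavor of objective, not TopPush's full optimization procedure.

```python
# O(n) evaluation of a "push positives above the top negative" hinge loss,
# versus the O(n^2) cost of summing over all positive-negative pairs.

def top_push_style_loss(pos_scores, neg_scores):
    top_neg = max(neg_scores)                  # one linear pass over negatives
    hinge = lambda z: max(0.0, 1.0 + z)
    # One linear pass over positives; no pairwise enumeration needed.
    return sum(hinge(top_neg - s) for s in pos_scores) / len(pos_scores)

loss = top_push_style_loss([2.5, 0.2, 1.8], [0.9, -0.3, 0.5])   # -> 0.6
```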
Linear-Time Primitives for Algorithm Development in Graphical Causal Inference
Wienöbst, Marcel, Weichwald, Sebastian, Henckel, Leonard
We introduce CIfly, a framework for efficient algorithmic primitives in graphical causal inference that isolates reachability as a reusable core operation. It builds on the insight that many causal reasoning tasks can be reduced to reachability in purpose-built state-space graphs that can be constructed on the fly during traversal. We formalize a rule table schema for specifying such algorithms and prove that they run in linear time. We establish CIfly as a more efficient alternative to the common primitives moralization and latent projection, which we show are computationally equivalent to Boolean matrix multiplication. Our open-source Rust implementation parses rule table text files and runs the specified CIfly algorithms, providing high-performance execution accessible from Python and R. We demonstrate CIfly's utility by re-implementing a range of established causal inference tasks within the framework and by developing new algorithms for instrumental variables.
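The reduce-to-reachability pattern can be sketched as a BFS over (node, state) pairs whose transitions are filtered by a rule. The toy rule below ("never traverse two consecutive edges of the same color") is a hypothetical stand-in for CIfly's rule tables, not the CIfly format; graph and names are illustrative.

```python
# BFS over a state-space graph built on the fly: states are the colors of the
# last traversed edge, and a rule decides which transitions are allowed.
from collections import deque

def rule_reach(edges, start):
    """edges: node -> list of (neighbor, color). State = last edge color."""
    seen = set()
    queue = deque((nbr, color) for nbr, color in edges.get(start, []))
    seen.update(queue)
    while queue:
        node, state = queue.popleft()
        for nbr, color in edges.get(node, []):
            if color != state and (nbr, color) not in seen:   # the rule
                seen.add((nbr, color))
                queue.append((nbr, color))
    return {start} | {node for node, _ in seen}

edges = {'a': [('b', 'red')], 'b': [('c', 'red'), ('d', 'blue')],
         'd': [('e', 'red')]}
reach = rule_reach(edges, 'a')        # 'c' is blocked by the rule
```

Each (node, state) pair enters the queue at most once, so the traversal is linear in the size of the state-space graph, which is the property the paper proves for its rule-table algorithms.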